Handwritten Number Recognition with TFLearn and MNIST

In this notebook, we'll be building a neural network that recognizes handwritten numbers 0-9.

This kind of neural network is used in a variety of real-world applications, including recognizing phone numbers and sorting postal mail by address. To build the network, we'll be using the MNIST data set, which consists of images of handwritten numbers and their correct labels 0-9.

We'll be using TFLearn, a high-level library built on top of TensorFlow, to build the neural network. We'll start off by importing all the modules we'll need, then load the data, and finally build the network.


In [2]:
# Import Numpy, TensorFlow, TFLearn, and MNIST data
import numpy as np
import tensorflow as tf
import tflearn
import tflearn.datasets.mnist as mnist

Retrieving training and test data

The MNIST data set already contains both training and test data. There are 55,000 data points of training data, and 10,000 points of test data.

Each MNIST data point has:

  1. an image of a handwritten digit and
  2. a corresponding label (a number 0-9 that identifies the image)

We'll call the images, which will be the input to our neural network, X, and their corresponding labels Y.

We're going to want our labels as one-hot vectors, which are vectors that hold mostly 0's and a single 1. It's easiest to see this with an example. As a one-hot vector, the number 0 is represented as [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], and 4 is represented as [0, 0, 0, 0, 1, 0, 0, 0, 0, 0].
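If you want to see where this encoding comes from, here is a minimal NumPy sketch (the to_one_hot helper is purely illustrative; mnist.load_data will do this for us when we pass one_hot=True):

# Minimal sketch of one-hot encoding (illustrative helper, not part of TFLearn)
import numpy as np

def to_one_hot(digit, num_classes=10):
    # Start with all zeros and set the position matching the digit to 1
    vector = np.zeros(num_classes)
    vector[digit] = 1
    return vector

to_one_hot(0)   # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
to_one_hot(4)   # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]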

Flattened data

For this example, we'll be using flattened data, or a representation of MNIST images in one dimension rather than two. So, each handwritten number image, which is 28x28 pixels, will be represented as a one-dimensional array of 784 pixel values.

Flattening the data throws away information about the 2D structure of the image, but it simplifies our data so that all of the training data can be contained in one array whose shape is [55000, 784]; the first dimension is the number of training images and the second dimension is the number of pixels in each image. This is the kind of data that is easy to analyze using a simple neural network.
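As a small sketch of what flattening looks like in NumPy (the image variable here is just a stand-in for a single 28x28 MNIST image; the loader below hands us the data already flattened):

# Sketch: flattening one 28x28 image into a 784-element vector
import numpy as np

image = np.zeros((28, 28))      # stand-in for a single 28x28 handwritten digit
flat = image.reshape(784)       # same pixel values as a one-dimensional array
flat.shape                      # (784,)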


In [3]:
# Retrieve the training and test data
trainX, trainY, testX, testY = mnist.load_data(one_hot=True)


Extracting mnist/train-images-idx3-ubyte.gz
//anaconda/envs/tflearn/lib/python3.5/gzip.py:274: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  return self._buffer.read(size)
//anaconda/envs/tflearn/lib/python3.5/site-packages/tflearn/datasets/mnist.py:52: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  data = data.reshape(num_images, rows, cols, 1)
Extracting mnist/train-labels-idx1-ubyte.gz
Extracting mnist/t10k-images-idx3-ubyte.gz
Extracting mnist/t10k-labels-idx1-ubyte.gz

Visualize the training data

Provided below is a function that will help you visualize the MNIST data. Pass in the index of a training example, and the show_digit function will display that training image along with its corresponding label in the title.


In [4]:
# Visualizing the data
import matplotlib.pyplot as plt
%matplotlib inline

# Function for displaying a training image by its index in the MNIST set
def show_digit(index):
    label = trainY[index].argmax(axis=0)
    # Reshape 784 array into 28x28 image
    image = trainX[index].reshape([28,28])
    plt.title('Training data, index: %d,  Label: %d' % (index, label))
    plt.imshow(image, cmap='gray_r')
    plt.show()
    
# Display the first (index 0) training image
show_digit(0)


Building the network

TFLearn lets you build the network by defining the layers in that network.

For this example, you'll define:

  1. The input layer, which tells the network the number of inputs it should expect for each piece of MNIST data.
  2. Hidden layers, which recognize patterns in data and connect the input to the output layer, and
  3. The output layer, which defines how the network learns and outputs a label for a given image.

Let's start with the input layer; to define the input layer, you'll define the type of data that the network expects. For example,

net = tflearn.input_data([None, 100])

would create a network with 100 inputs. The number of inputs to your network needs to match the size of your data. For this example, we're using 784-element vectors to encode our input data, so we need 784 input units.
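So for this data, the input layer looks like:

net = tflearn.input_data([None, 784])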

Adding layers

To add new hidden layers, you use

net = tflearn.fully_connected(net, n_units, activation='ReLU')

This adds a fully connected layer where every unit (or node) in the previous layer is connected to every unit in this layer. The first argument net is the network you created in the tflearn.input_data call; it designates the input to the hidden layer. You can set the number of units in the layer with n_units, and set the activation function with the activation keyword. You can keep adding layers to your network by repeatedly calling tflearn.fully_connected(net, n_units).

Then, to set how you train the network, use:

net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')

Again, this is passing in the network you've been building. The keywords:

  • optimizer sets the training method, here stochastic gradient descent
  • learning_rate is the learning rate
  • loss determines how the network error is calculated; in this example, categorical cross-entropy.

Finally, you put all this together to create the model with tflearn.DNN(net).
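That is:

model = tflearn.DNN(net)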

Exercise: Below in the build_model() function, you'll put together the network using TFLearn. You get to choose how many layers to use, how many hidden units, etc.

Hint: The final output layer must have 10 output nodes (one for each digit 0-9). It's also recommended to use a softmax activation layer as your final output layer.


In [5]:
# Define the neural network
def build_model():
    # This resets all parameters and variables, leave this here
    tf.reset_default_graph()
    
    #### Your code ####
    # Include the input layer, hidden layer(s), and set how you want to train the model
    net = tflearn.input_data([None, 784])
    net = tflearn.fully_connected(net, 128, activation='ReLU')
    net = tflearn.fully_connected(net, 32, activation='ReLU')
    net = tflearn.fully_connected(net, 10, activation='softmax')
    net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')
    
    # This model assumes that your network is named "net"    
    model = tflearn.DNN(net)
    return model

In [6]:
# Build the model
model = build_model()


WARNING:tensorflow:From //anaconda/envs/tflearn/lib/python3.5/site-packages/tflearn/summaries.py:46 in get_summary.: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
WARNING:tensorflow:From //anaconda/envs/tflearn/lib/python3.5/site-packages/tflearn/helpers/trainer.py:766 in create_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge.
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
WARNING:tensorflow:From //anaconda/envs/tflearn/lib/python3.5/site-packages/tflearn/helpers/trainer.py:130 in __init__.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.

Training the network

Now that we've constructed the network, saved as the variable model, we can fit it to the data. Here we use the model.fit method. You pass in the training features trainX and the training targets trainY. Below I set validation_set=0.1, which reserves 10% of the data set as the validation set. You can also set the batch size and number of epochs with the batch_size and n_epoch keywords, respectively.

Too few epochs don't effectively train your network, and too many take a long time to execute. Choose wisely!


In [7]:
# Training
model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=100, n_epoch=8)


Training Step: 3960  | total loss: 0.17724
| SGD | epoch: 008 | loss: 0.17724 - acc: 0.9681 | val_loss: 0.13711 - val_acc: 0.9576 -- iter: 49500/49500
--

Testing

After you're satisfied with the training output and accuracy, you can then run the network on the test data set to measure its performance! Remember, only do this after you've done the training and are satisfied with the results.

A good result will be higher than 98% accuracy! Some simple models have been known to get up to 99.7% accuracy.


In [8]:
# Compare the labels that our model predicts with the actual labels
predictions = (np.array(model.predict(testX))[:,0] >= 0.5).astype(np.int_)

# Calculate the accuracy, which is the percentage of times the predicted labels matched the actual labels
test_accuracy = np.mean(predictions == testY[:,0], axis=0)

# Print out the result
print("Test accuracy: ", test_accuracy)


Test accuracy:  0.9967

BUT WAIT!

Isn't column zero [:,0] just the prediction for the digit ZERO?

Let's look at our 10000 testY vectors. Each has a one-hot encoding of the digits 0-9.


In [9]:
np.bincount(testY[:,0].astype(np.int_))


Out[9]:
array([9020,  980])

Column zero has 9020 zeros and 980 ones. So about one tenth of our 10000 test images were zeros.

Column one is different:


In [10]:
np.bincount(testY[:,1].astype(np.int_))


Out[10]:
array([8865, 1135])

It has over 1000 ones. So there are more ones than zeros in the testY set.

Let's look at ALL the digits:


In [11]:
for i in range(10):
    print(i, np.bincount(testY[:,i].astype(np.int_)))


0 [9020  980]
1 [8865 1135]
2 [8968 1032]
3 [8990 1010]
4 [9018  982]
5 [9108  892]
6 [9042  958]
7 [8972 1028]
8 [9026  974]
9 [8991 1009]

As a check, adding up all the one-hot 1's, we should get a total of 10000:


In [12]:
980 + \
1135+ \
1032+ \
1010+ \
982 + \
892 + \
958 + \
1028+ \
974 + \
1009


Out[12]:
10000

Now let's look at our predictions in the same way:


In [13]:
for i in range(10):
    print(i, np.bincount((np.array(model.predict(testX))[:,i] >= 0.5).astype(np.int_)))


0 [9019  981]
1 [8870 1130]
2 [8959 1041]
3 [9029  971]
4 [8985 1015]
5 [9147  853]
6 [9024  976]
7 [8971 1029]
8 [9027  973]
9 [9090  910]

And what about our accuracy test?

This shows how we got 99.67% accuracy: 33 errors out of 10000.


In [14]:
np.bincount(predictions == testY[:,0])


Out[14]:
array([  33, 9967])

But what about the other columns besides zero?

It turns out those other columns (the other digits) have different levels of error.

Up to 1.15% error for the nines.


In [16]:
for i in range(10):
    print(i, np.bincount((np.array(model.predict(testX))[:,i] >= 0.5).astype(np.int_) == testY[:,i]))


0 [  33 9967]
1 [  33 9967]
2 [  67 9933]
3 [  71 9929]
4 [  73 9927]
5 [  59 9941]
6 [  60 9940]
7 [  69 9931]
8 [  83 9917]
9 [ 115 9885]
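
For a single score that accounts for all ten digits at once, one option (just a sketch, using the model and test arrays defined above) is to compare the predicted digit, i.e. the argmax of each prediction vector, with the argmax of its one-hot label:

# Sketch: overall accuracy based on the predicted digit rather than a single column
predicted_digits = np.array(model.predict(testX)).argmax(axis=1)
actual_digits = testY.argmax(axis=1)
print("Overall accuracy: ", np.mean(predicted_digits == actual_digits))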